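Video highlight detection has long been a topic in computer vision, mining user-appealing clips from untrimmed raw video inputs. However, in most cases, the mainstream methods in this line of research are built on the closed-world assumption. Under this assumption, a fixed number of highlight categories is properly defined in advance and all training data must be available simultaneously. As a result, these methods scale poorly with respect to both the highlight categories and the size of the dataset. To address these issues, we propose a video highlight detector capable of incremental learning, namely \textbf{G}lobal \textbf{P}rototype \textbf{E}ncoding (GPE), which captures newly defined video highlights in the extended dataset via their corresponding prototypes. In addition, we present an annotated and costly dataset termed \emph{ByteFood}, comprising more than 5.1k gourmet videos that belong to the domains \emph{cooking}, \emph{eating}, \emph{food material}, and \emph{presentation}. To the best of our knowledge, this is the first time the incremental learning setting has been introduced to video highlight detection. It eases the burden of training on video inputs and promotes the scalability of conventional neural networks in proportion to both the size of the dataset and the number of domains. Moreover, the proposed GPE surpasses current incremental learning methods on \emph{ByteFood}, reporting an improvement of at least 1.57\% mAP. The code and dataset will be made available soon.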
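The abstract does not spell out how GPE encodes its prototypes. As a rough illustration of why prototypes make incremental learning cheap, here is a generic nearest-prototype (class-mean) sketch, not the paper's method. The class name `PrototypeBank`, mean-embedding prototypes, and cosine matching are assumptions. Clip embeddings are assumed to come from some pretrained video encoder.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class PrototypeBank:
    """Hypothetical nearest-prototype classifier: one prototype per highlight category."""

    def __init__(self):
        self.prototypes = {}  # category name -> prototype vector

    def add_category(self, name, embeddings):
        # A new category only adds a prototype; existing prototypes are untouched,
        # so earlier categories need no access to their old training data.
        self.prototypes[name] = mean_vector(embeddings)

    def predict(self, embedding):
        # Assign the clip embedding to the most similar prototype.
        return max(self.prototypes, key=lambda name: cosine(embedding, self.prototypes[name]))

# Toy 2-D "embeddings" standing in for real clip features.
bank = PrototypeBank()
bank.add_category("cooking", [[1.0, 0.1], [0.9, 0.0]])
bank.add_category("eating", [[0.0, 1.0], [0.1, 0.9]])
# A later stage introduces a new category using only that category's data.
bank.add_category("presentation", [[-1.0, 0.0], [-0.9, 0.1]])
print(bank.predict([0.95, 0.05]))  # cooking
print(bank.predict([-0.8, 0.0]))   # presentation
```

Adding a category is a single mean computation, which is the scalability property the closed-world setting lacks. A learned encoder, as in the paper, would replace the toy vectors.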
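Recently, deep learning has been extensively studied to accelerate dynamic magnetic resonance (MR) imaging, with encouraging progress. However, without fully sampled reference data for training, current methods may have limited ability to recover details or structures. To address this challenge, this paper proposes a self-supervised collaborative learning framework (SelfCoLearn) for accurate dynamic MR image reconstruction from undersampled k-space data. The proposed framework is equipped with three important components: dual-network collaborative learning, re-undersampling data augmentation, and a specially designed co-training loss. The framework can be flexibly integrated with both data-driven networks and model-based iterative unrolled networks. Our method has been evaluated on an in vivo dataset and compared with four state-of-the-art methods. The results show that our method can capture essential and inherent representations for direct reconstruction from undersampled k-space data, and thus enables high-quality and fast dynamic MR imaging.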
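The abstract names re-undersampling data augmentation but does not give the sampling scheme. A minimal sketch of the general idea, under stated assumptions: each example's already-undersampled k-space is undersampled again with two different random masks, giving two inputs for the two collaborating networks. The assumptions are 1-D Cartesian masks over phase-encoding lines, an always-kept low-frequency center, and the hypothetical function names `re_undersample` and `two_views`.

```python
import random

def re_undersample(acquired_lines, keep_fraction, center_lines, rng):
    """Return a random subset of already-acquired k-space lines.

    acquired_lines: sorted phase-encoding indices present in the undersampled scan.
    center_lines: low-frequency indices that are always kept when acquired (assumption).
    """
    center = [k for k in acquired_lines if k in center_lines]
    outer = [k for k in acquired_lines if k not in center_lines]
    n_keep = round(keep_fraction * len(outer))
    kept = rng.sample(outer, n_keep)  # drop a random part of the high-frequency lines
    return sorted(center + kept)

def two_views(acquired_lines, keep_fraction=0.6, center_width=4, n_lines=64, seed=0):
    """Produce two different re-undersampled masks, one per network (hypothetical helper)."""
    rng = random.Random(seed)
    mid = n_lines // 2
    center_lines = set(range(mid - center_width // 2, mid + center_width // 2))
    view_a = re_undersample(acquired_lines, keep_fraction, center_lines, rng)
    view_b = re_undersample(acquired_lines, keep_fraction, center_lines, rng)
    return view_a, view_b

# Every 4th line of a 64-line k-space, plus a fully sampled center.
acquired = sorted(set(range(0, 64, 4)) | set(range(30, 34)))
view_a, view_b = two_views(acquired)
print(view_a)
print(view_b)
```

Both views are strict subsets of the acquired data, so no ground truth is needed. How the two networks' outputs are then tied together by the co-training loss is not covered by this sketch.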
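Machine learning systems, especially deep-learning-based methods, enjoy great success in modern computer vision tasks under experimental settings. Generally, these classic deep learning methods are built on the \emph{i.i.d.} assumption, supposing that training and test data are drawn independently and identically from similar distributions. However, the aforementioned \emph{i.i.d.} assumption generally does not hold in real-world scenarios, which leads to a sharp performance decay of deep learning algorithms. Domain shift is one of the primary factors behind this. To tackle this problem, we propose using \textbf{Po}tential \textbf{E}nergy \textbf{R}anking (PoER) to decouple the object feature and the domain feature in given images, promoting the learning of label-discriminative features while filtering out irrelevant correlations between objects and the background. PoER helps neural networks first capture label-related features that contain domain information and then gradually distill label-discriminative representations, forcing the networks to be aware of the characteristics of objects and background, which is vital to generating domain-invariant features. PoER reports superior performance on domain generalization benchmarks, improving the average top-1 accuracy by at least 1.20\% compared to existing methods. Moreover, we used PoER in the ECCV 2022 NICO Challenge\footnote{https://nicochallenge.com}, achieving top place with only a vanilla ResNet-18. The code has been made available at https://github.com/foreverps/poer.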
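This paper focuses on the problem of detecting out-of-distribution (OOD) samples with neural networks. In image recognition tasks, trained classifiers often assign high confidence scores to input images that lie far from the in-distribution (ID) data, which greatly limits their real-world application. To alleviate this problem, we propose a GAN-based boundary aware classifier (GBAC) that generates a closed hyperspace containing only most of the ID data. Our method is based on the observation that traditional neural networks separate the feature space into several unclosed regions, which are not suitable for OOD detection. With GBAC as an auxiliary module, OOD data distributed outside the closed hyperspace are assigned much lower scores, allowing more effective OOD detection while maintaining classification performance. Moreover, we present a fast sampling method for generating hard representations that lie on the boundary of the aforementioned closed hyperspace. Experiments on several datasets and neural network architectures demonstrate the effectiveness of GBAC.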
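The argument rests on the claim that standard classifiers carve feature space into unclosed regions. A toy illustration of that premise, not of GBAC itself (which learns its boundary with a GAN): a linear softmax classifier grows *more* confident the farther a point lies from the training data, whereas a score tied to a bounded region around class centers falls off. The class centers, the radius, and the Gaussian falloff in `closed_region_score` are assumptions chosen for illustration.

```python
import math

def softmax_confidence(x, weights, biases):
    """Max softmax probability of a linear classifier (open, unbounded decision regions)."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract max for numerical stability
    return max(exps) / sum(exps)

def closed_region_score(x, centers, radius):
    """Score that is high only inside a ball around some class center (a closed region)."""
    d = min(math.dist(x, c) for c in centers)
    return math.exp(-(d / radius) ** 2)

weights, biases = [[1.0, 0.0], [-1.0, 0.0]], [0.0, 0.0]
centers = [(1.0, 0.0), (-1.0, 0.0)]
near, far = (1.0, 0.0), (100.0, 0.0)
print(softmax_confidence(near, weights, biases))   # about 0.88
print(softmax_confidence(far, weights, biases))    # about 1.0, although far is nowhere near the data
print(closed_region_score(far, centers, radius=1.0))  # about 0.0
```

The far-away point is exactly the kind of OOD input the abstract describes: the open-region classifier is maximally confident, while any closed-region score rejects it.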
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has examined learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). Together, these modules help the model learn complex distortion patterns and better optimize the regression problem, following the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.
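The abstract does not say how PMT orders or weights its tasks. A hypothetical easy-to-hard schedule, under stated assumptions, might pair an "easy" coarse task (classifying quality into a few bins) with the "hard" regression task and shift loss weight linearly from one to the other. Here the bin count, the 0 to 100 score range, and the linear schedule are all assumptions, and `quality_bin`, `progressive_weights`, and `total_loss` are illustrative names.

```python
def quality_bin(score, n_bins, lo=0.0, hi=100.0):
    """Map a continuous quality score to a coarse class label (an assumed 'easy' task)."""
    idx = int((score - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # clamp so score == hi lands in the last bin

def progressive_weights(epoch, total_epochs):
    """Linearly shift emphasis from the easy task to the hard regression task."""
    t = min(max(epoch / total_epochs, 0.0), 1.0)
    return {"easy": 1.0 - t, "hard": t}

def total_loss(easy_loss, hard_loss, epoch, total_epochs):
    """Combine the two task losses with the current progressive weights."""
    w = progressive_weights(epoch, total_epochs)
    return w["easy"] * easy_loss + w["hard"] * hard_loss

print(quality_bin(55.0, 5))          # 2
print(progressive_weights(0, 10))    # {'easy': 1.0, 'hard': 0.0}
print(total_loss(2.0, 4.0, 5, 10))   # 3.0
```

A linear ramp is only one choice; step or cosine schedules would express the same easy-to-hard idea.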
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably involves a trade-off between accuracy and efficiency. Recent advances in neural operators, a class of mesh-independent, neural-network-based PDE solvers, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative example and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and on ERA5, one of the largest high-resolution data sets of global-scale climate fields. These demonstrations suggest the potential of KoopmanLab for diverse applications of partial differential equations.
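Koopman-based operators rest on the idea that dynamics can be advanced by a *linear* operator acting on a suitable representation of the state. The simplest finite-dimensional instance of that idea is dynamic mode decomposition (DMD). DMD fits a linear map $K$ with $x_{t+1} \approx K x_t$ from snapshot pairs by least squares, $K = Y X^\top (X X^\top)^{-1}$, and then rolls $K$ forward for long-horizon prediction. The sketch below is plain DMD on raw 2-D states with no learned encoder. It illustrates the principle and is not KoopmanLab's API.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def fit_koopman_2d(states):
    """Least-squares linear operator K with x_{t+1} ~ K x_t (DMD-style) for 2-D states."""
    X = transpose(states[:-1])  # 2 x (T-1); columns are x_t
    Y = transpose(states[1:])   # 2 x (T-1); columns are x_{t+1}
    return matmul(matmul(Y, transpose(X)), inv2(matmul(X, transpose(X))))

def rollout(K, x0, steps):
    """Long-horizon prediction by repeatedly applying the linear operator."""
    xs = [list(x0)]
    for _ in range(steps):
        xs.append([sum(K[i][j] * xs[-1][j] for j in range(2)) for i in range(2)])
    return xs

# Generate a damped rotation, then recover its operator from the trajectory alone.
A = [[0.9, -0.2], [0.2, 0.9]]
trajectory = rollout(A, [1.0, 0.0], 20)
K = fit_koopman_2d(trajectory)
forecast = rollout(K, trajectory[-1], 50)  # extrapolate 50 steps beyond the data
print(K)  # approximately A
```

For genuinely nonlinear PDEs the linear fit only works after lifting the state into a richer observable space. KNO learns that space with a neural network rather than using raw states as here.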
Object instance segmentation is a key challenge for indoor robots navigating cluttered environments with many small objects. Limitations in 3D sensing capabilities often make it difficult to detect every possible object. While deep learning approaches may be effective for this problem, manually annotating 3D data for supervised learning is time-consuming. In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We introduce a zero-shot split for Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the ``objectness'' of pixels and generalize to unseen object categories in cluttered indoor environments. Our method, SupeRGB-D, groups pixels into small patches based on geometric cues and learns to merge the patches in a deep agglomerative clustering fashion. SupeRGB-D outperforms existing baselines on unseen objects while achieving similar performance on seen objects. Additionally, it is extremely lightweight (0.4 MB memory requirement) and suitable for mobile and robotic applications. The dataset split and code will be made publicly available upon acceptance.
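SupeRGB-D merges geometry-based patches in an agglomerative fashion using a learned merge decision. A simplified sketch of greedy agglomerative merging follows, using union-find. `merge_score` is a placeholder for the learned network; here it is a toy depth-similarity rule. The single pass without re-scoring after each merge is a simplification, and the function and argument names are hypothetical.

```python
def agglomerate(num_patches, adjacent_pairs, merge_score, threshold=0.5):
    """Greedily merge adjacent patches whose merge score exceeds a threshold.

    merge_score(i, j) stands in for a learned model returning the probability
    that patches i and j belong to the same object instance.
    """
    parent = list(range(num_patches))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Visit the most confident pairs first, as in greedy agglomerative clustering.
    scored = sorted(((merge_score(i, j), i, j) for i, j in adjacent_pairs), reverse=True)
    for score, i, j in scored:
        if score <= threshold:
            break  # remaining pairs are all below the threshold
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    # Relabel cluster roots as consecutive instance ids in patch order.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(num_patches)]

# Five patches in a row; mean depth jumps between patch 2 and patch 3.
depths = [1.0, 1.05, 1.1, 2.0, 2.05]
pairs = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = agglomerate(5, pairs, lambda i, j: 1.0 - abs(depths[i] - depths[j]))
print(labels)  # [0, 0, 0, 1, 1]
```

Because merging depends only on pairwise patch relations, not on semantic class, the same procedure applies to unseen object categories.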
Modern telecom systems are monitored with performance and system logs from multiple application layers and components. Detecting anomalous events from these logs is key to identifying security breaches, resource over-utilization, critical/fatal errors, etc. Current supervised log anomaly detection frameworks tend to perform poorly on new types or signatures of anomalies that have few or no samples in the training data. In this work, we propose a meta-learning-based log anomaly detection framework (LogAnMeta) for detecting anomalies from sequences of log events with few samples. LogAnMeta trains a hybrid few-shot classifier in an episodic manner. The experimental results demonstrate the efficacy of our proposed method.
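Episodic training means the classifier is trained on many small N-way K-shot tasks rather than one fixed label set, which mimics the situation of meeting a new anomaly type with only a few examples. The sketch below shows only the generic episode-sampling step, not LogAnMeta's hybrid classifier. The dataset layout, the class names, and the `(label, i)` placeholders standing in for log-event sequences are assumptions.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_queries, rng):
    """Draw one N-way K-shot episode from a {label: [examples]} dict.

    Each sampled class contributes k_shot support and q_queries query examples,
    with no example appearing in both sets.
    """
    eligible = [label for label, items in dataset.items() if len(items) >= k_shot + q_queries]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(dataset[label], k_shot + q_queries)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Hypothetical classes; each (label, i) tuple stands in for one log-event sequence.
logs = {label: [(label, i) for i in range(4)]
        for label in ["normal", "auth_failure", "disk_full", "oom"]}
support, query = sample_episode(logs, n_way=2, k_shot=1, q_queries=2, rng=random.Random(0))
print(support)
print(query)
```

In each episode the model sees only the support set and is scored on the query set, so it is optimized for adapting from few samples.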
We present SODA: the first publicly available, million-scale high-quality social dialogue dataset. Using SODA, we train COSMO: a generalizable conversation agent outperforming previous best-performing agents on both in- and out-of-domain datasets. In contrast to most existing crowdsourced, small-scale dialogue corpora, we distill 1.5M socially-grounded dialogues from a pre-trained language model (InstructGPT; Ouyang et al., 2022). Dialogues are distilled by contextualizing social commonsense knowledge from a knowledge graph (Atomic10x; West et al., 2022). Human evaluation shows that dialogues in SODA are more consistent, specific, and (surprisingly) natural than prior human-authored datasets - e.g., DailyDialog (Li et al., 2017), BlendedSkillTalk (Smith et al., 2020). In addition, extensive evaluations show that COSMO is significantly more natural and consistent on unseen datasets than best-performing dialogue models - e.g., GODEL (Peng et al., 2022), BlenderBot (Roller et al., 2021), DialoGPT (Zhang et al., 2020). Furthermore, it is sometimes even preferred to the original human-written gold responses. We make our data, models, and code public.
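"Contextualizing social commonsense knowledge" starts from knowledge-graph triples such as `(PersonX loses their wallet, xReact, worried)` and turns them into grounded text that a language model can expand into a dialogue. The abstract does not give SODA's sentence templates or prompts. The template wording, the prompt text, and the function names `verbalize` and `dialogue_prompt` below are hypothetical, and the actual language-model call is omitted. Only the relation names `xWant` and `xReact` follow the ATOMIC naming convention.

```python
# Hypothetical verbalization templates keyed by ATOMIC-style relation names.
TEMPLATES = {
    "xWant": "{head}. As a result, PersonX wants {tail}.",
    "xReact": "{head}. PersonX feels {tail}.",
}

def verbalize(head, relation, tail):
    """Turn a (head, relation, tail) commonsense triple into a sentence."""
    return TEMPLATES[relation].format(head=head, tail=tail)

def dialogue_prompt(head, relation, tail, name, partner):
    """Ground the triple in named speakers and ask a language model for a conversation."""
    sentence = verbalize(head, relation, tail).replace("PersonX", name)
    return (f"{sentence}\n"
            f"Write a conversation between {name} and {partner} about this situation.\n"
            f"{name}:")

prompt = dialogue_prompt("PersonX loses their wallet", "xReact", "worried", "Jordan", "Sam")
print(prompt)
```

Replacing placeholder variables with concrete names is what turns an abstract triple into a specific social situation.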
We propose a novel task, G4C (Goal-driven Guidance Generation in Grounded Communication), for studying goal-driven and grounded natural language interactions. Specifically, we choose Dungeons and Dragons (D&D) -- a role-playing game consisting of multiple player characters and a Dungeon Master (DM) who collaborate to achieve a set of goals that are beneficial to the players -- as a testbed for this task. Here, each of the player characters is a student, with their own personas and abilities, and the DM is the teacher, an arbitrator of the rules of the world and responsible for assisting and guiding the students towards a global goal. We propose a theory-of-mind-inspired methodology for training such a DM with reinforcement learning (RL), where a DM: (1) learns to predict how the players will react to its utterances using a dataset of D&D dialogue transcripts; and (2) uses this prediction as a reward function providing feedback on how effective these utterances are at guiding the players towards a goal. Human and automated evaluations show that a DM trained with RL to generate guidance by incorporating a theory-of-mind of the players significantly improves the players' ability to achieve goals grounded in their shared world.
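Step (2) of the methodology turns predicted player reactions into a reward for the DM's utterance. A minimal sketch of that idea follows, assuming the predictor outputs one discrete action per player and the goal is a set of goal-advancing actions. The scoring rule (fraction of players predicted to advance the goal), the action names, and the lookup-table predictor standing in for the learned model are all hypothetical.

```python
def guidance_reward(utterance, predict_player_actions, goal_actions):
    """Reward = fraction of players predicted to take a goal-advancing action.

    predict_player_actions stands in for a learned model mapping a DM utterance
    to one predicted action per player.
    """
    predicted = predict_player_actions(utterance)
    if not predicted:
        return 0.0
    return sum(action in goal_actions for action in predicted) / len(predicted)

# Toy lookup table standing in for the learned reaction predictor.
toy_predictions = {
    "The door is locked. Perhaps someone nimble could try it?": ["pick_lock", "look_around", "pick_lock"],
    "Roll initiative!": ["attack", "attack", "cast_spell"],
}

def predict(utterance):
    return toy_predictions.get(utterance, [])

goal = {"pick_lock"}
hint_reward = guidance_reward("The door is locked. Perhaps someone nimble could try it?", predict, goal)
combat_reward = guidance_reward("Roll initiative!", predict, goal)
print(hint_reward, combat_reward)
```

An RL algorithm would then update the DM's generator to favor utterances with higher reward, such as the hint over the combat call here.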